Abstract. - We study co-evolutionary Prisoner's Dilemma games where each player can imitate
both the strategy and imitation rule from a randomly chosen neighbor with a probability depen- 
dent on the payoff difference when the player's income is collected from games with the neighbors. 
The players, located on the sites of a two-dimensional lattice, follow unconditional cooperation 
or defection and use an individual strategy adoption rule described by a parameter. If the system
is started from a random initial state then the present co-evolutionary rule drives the system to- 
wards a state where only one evolutionary rule remains alive even in the coexistence of cooperative 
and defective behaviors. The final rule is related to the optimum providing the highest level of 
cooperation and affected by the topology of the connectivity structure. 



Evolutionary game theory provides a general mathematical
framework for the investigation of multi-agent
systems used widely in biology, economics and other social
sciences [1-3]. In these systems we have an extremely
large freedom in the definition of models giving the set of 
strategies (states or species), the interaction (payoff ma- 
trix), the connectivity structure (varied from lattice to 
scale-free network), and dynamical rules. Due to the large
number of possibilities the complete exploration of these 
systems requires a long time because we need to deter- 
mine separately the effect of all the mentioned ingredients 
of the model on the system behavior. To overcome this 
difficulty, introducing dynamical rules in which not
only the strategy but also other individual features
of the players change [4] may reveal the relevant region of parameter
space that is important to study. During simultaneous
evolutions (briefly co-evolution) of variables the success- 
driven Darwinian selection can serve as a general tool to 
identify the characteristic dynamical rules. The goal of 
this letter is to demonstrate that the fixation of a crucial
parameter is possible and the resulting value is in close 
connection to the state where cooperation is the largest 
that can be achieved at the corresponding payoff elements 
and topology. 

The systematic investigation of the evolutionary Pris- 
oner's Dilemma (PD) games has attracted a considerable 
effort in the last decades because these models can describe
how cooperative behavior is maintained
among selfish individuals [5]. Originally the PD is
a two-person one-shot game [2, 3] where the players have
two options (cooperation or defection) to choose and their 
income depends on their choices. The ranking of possible payoffs
forces both (intelligent and selfish, or rational) players
to choose defection, yielding the second-worst income
for each while the mutual cooperation provides higher in- 
come for both players. The situation is changed drasti- 
cally in the multi-agent systems where the player's income 
comes from repeated games with the neighbors defined 
by a connectivity structure (lattice with nearest neighbor 
connections or other graphs). The evolutionary games are 
the combination of the multi-agent repeated games and 
Darwinian selection. Namely, sometimes the players are 
allowed to modify their strategy by imitating one of the 
more successful neighbors (in biological context: an off- 
spring of the more successful species will be substituted 
for a less successful one). 

It turned out that the cooperative behavior can be sus- 
tained among the spatially arranged players with local in- 
teractions [6] for a wide range of evolutionary (here imi- 
tation) rules even if they can follow only one of the two 
simplest strategies: unconditional cooperation or defec- 
tion. Subsequent investigations have clarified the main ef- 
fect of payoffs, connectivity structure (including networks 
with inhomogeneous degree distribution), and noise on the
level of cooperation (for a survey see [3,7]). These investi- 
gations highlighted a mechanism supporting the coopera- 
tion efficiently in more realistic models where the number 
of neighbors varies within a wide range [8,9] or the individuals
differ in their personal capability to pass on their own
strategy to others [10,11]. Evidently,
the enhancement of the strategy set (e.g., the application 
of stochastic reactive strategies [12], deterministic strate- 
gies of finite memory [13], and Q-learning strategies [14]) 
opened further avenues for supporting
cooperation.

The simultaneous evolution (henceforth co-evolution) of 
strategies and another feature of the model was investi- 
gated previously by many authors. First the co-evolution 
of strategy distribution and connectivity structure was 
studied (for examples see [15-21]). The co-evolution of 
the strategy distribution and inhomogeneous capability of 
strategy transfer was also investigated in recent years
[22,23]. In another model the individuals were allowed to 
have different payoff matrices that can be adopted (imi- 
tated) together with the strategy, too [24,25]. Very re- 
cently, van Segbroeck et al. [26, 27] have introduced a 
co-evolutionary PD game where the players are able
to modify their connections in different ways and, in parallel
with the strategy adoption, they can also imitate
the neighbor's method used later in the rearrangement 
of their own neighborhood. Finally we have to mention 
that the co-evolution of strategy and individual learning 
(evolutionary) rule was investigated previously for some 
cases [28-30]. For example, Moyano and Sanchez [31] have 
studied the cases when the players adopt the strategy and 
dynamical rule from the better player if two strategies and 
two rules are allowed. 

Now we extend a previous model [32] to study what 
happens when the players can adopt not only the more 
prosperous strategy but the way of strategy adoption as 
well. The present set of strategy imitation rules is based 
on pairwise comparison of payoffs between two neighbor- 
ing players chosen at random. We assume that initially the 
players use different rules giving the probability of strategy 
adoption as a function of payoff difference divided by an 
individual parameter resembling the temperature in the 
Fermi-Dirac distribution function. It will be shown that 
the suggested co-evolutionary process drives the system 
towards a final state where all the players use the same im- 
itation rule even if their strategies are different. The state 
characterized by the fixed selection (learning) rule is close 
to the highest cooperativity (optimum) state that can be 
achieved applying the corresponding payoff elements and 
topology. Because the optimum level of cooperation depends
on the connectivity structure [32, 33], our investigation
is performed on both the square and kagome lattices,
representing two different classes of behavior. These
systems will be investigated by Monte Carlo (MC) simulations
and an extended version of the dynamical mean-field
approximation.

In the present model the players located on the sites $x$ of
a two-dimensional lattice can follow either the unconditional
cooperation or the defection strategy, in short, $s_x = C$ or
$D$. The player's income ($P_x$) comes from one-shot games
with the four nearest neighbors. Following the suggestion
of Nowak and May [34] we use a re-scaled payoff matrix of
the so-called weak PD game, i.e., the cooperative player
receives 1 or 0 if the co-player follows the $C$ or $D$ strategy, respectively,
while the defective player is rewarded by $b$ ($1 < b < 2$) or
0 if the opponent cooperates or defects. Initially, each player
follows a strategy ($s_x = C$ or $D$) chosen at random. Besides
this, we assume that the players use different imitation
rules characterized by a parameter $K_x$ chosen randomly
from a set of possible values $\{K_1, \ldots, K_n\}$ (as will be
detailed later on). In each subsequent elementary step of
the evolutionary process we choose two neighboring players
($x$ and $y$) at random, we determine their payoffs $P_x$ and
$P_y$, and player $x$ adopts the strategy $s_y$ and imitation rule
(characterized by $K_y$) with a probability



$$W = \frac{1}{1 + \exp[(P_x - P_y)/K_x]} \qquad (1)$$

in two (independent) consecutive processes. More precisely,
we generate two random numbers ($0 < r_1, r_2 < 1$),
and $s_x \to s_y$ if $r_1 < W$ and $K_x \to K_y$ if $r_2 < W$. This
means that both the strategy and the imitation rule
are very likely adopted if $P_y - P_x \gg K_x$. Evidently, there exist
elementary steps when either $s_y$ or $K_y$ or neither is adopted.
As a consequence of the independent processes, the imitation
of the imitation rule is possible even if the strategies are
the same ($s_x = s_y$). These dynamical rules imply the existence
of absorbing states with uniform strategies and/or
rules where the evolution is stopped separately. We should
note, however, that qualitatively similar results were ob- 
served when imitation of rules was only possible if players 
have different strategies. 
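
The elementary step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the reported simulations: it assumes an L x L square lattice with periodic boundary conditions, encodes C as 1 and D as 0, and picks the neighboring pair by drawing a random site and then a random neighbor. The names `payoff`, `fermi` and `elementary_step` are hypothetical.

```python
import math
import random

NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # four nearest neighbors

def payoff(s, i, j, b):
    """Income of the player at (i, j) from weak-PD games with its four neighbors.

    s[i][j] is 1 for C and 0 for D. A cooperator earns 1 and a defector
    earns b for every cooperating neighbor; all other payoffs are 0.
    """
    L = len(s)
    n_coop = sum(s[(i + di) % L][(j + dj) % L] for di, dj in NEIGHBORS)
    return n_coop * (1.0 if s[i][j] == 1 else b)

def fermi(p_x, p_y, k_x):
    """Adoption probability W = 1 / (1 + exp[(P_x - P_y) / K_x])."""
    z = (p_x - p_y) / k_x
    if z > 700.0:  # math.exp would overflow; W is effectively zero
        return 0.0
    return 1.0 / (1.0 + math.exp(z))

def elementary_step(s, K, b, rng):
    """Player x may adopt s_y and K_y in two independent trials with the same W."""
    L = len(s)
    i, j = rng.randrange(L), rng.randrange(L)      # player x
    di, dj = rng.choice(NEIGHBORS)
    ni, nj = (i + di) % L, (j + dj) % L            # neighbor y
    w = fermi(payoff(s, i, j, b), payoff(s, ni, nj, b), K[i][j])
    s_y, k_y = s[ni][nj], K[ni][nj]                # read y before updating x
    if rng.random() < w:                           # r_1 < W: adopt the strategy
        s[i][j] = s_y
    if rng.random() < w:                           # r_2 < W: adopt the rule
        K[i][j] = k_y
```

Because both trials use the same $W$ but separate random numbers, a step can change the strategy only, the rule only, both, or neither, as stated in the text.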

The individual parameter $K_x$ of player $x$ can be interpreted
in different ways [7,35]. On the one hand, we can
think that in realistic systems the payoff matrix describes
the average payoff and the current payoffs should be modified
by a stochastic term, as modelled by Perc [36]
and Traulsen et al. [37]. The noisy term can be caused by
the fluctuating environment, cognitive mistakes, etc. For
a suitable probability distribution of the stochastic contribution,
the deterministic imitation of the better player
can yield a strategy adoption rule similar to that given
by Eq. (1). In that case $K_x$ characterizes the amplitude of
the noise. On the other hand, the personal decision of players
can also involve stochastic elements reflecting their freedom
not to accept the better strategy or even to follow
the worse one (for the latter interpretation $K_x$ denotes
the average amount of payoff that player $x$ hazards when
looking for a better solution).
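
The first interpretation can be checked numerically. The sketch below assumes, purely for illustration, that the stochastic contribution to the payoff difference follows a logistic distribution with scale $K_x$ (the text only requires "a suitable probability distribution"). Under this assumption, deterministic imitation of the apparently better player reproduces the adoption probability $1/(1 + \exp[(P_x - P_y)/K_x])$. The function names are hypothetical.

```python
import math
import random

def fermi(p_x, p_y, k_x):
    """Adoption probability 1 / (1 + exp[(P_x - P_y) / K_x])."""
    return 1.0 / (1.0 + math.exp((p_x - p_y) / k_x))

def logistic_noise(k_x, rng):
    """Draw logistic noise with scale k_x by inverting its cumulative distribution."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep u strictly inside (0, 1)
    return k_x * math.log(u / (1.0 - u))

def noisy_imitation_fraction(p_x, p_y, k_x, n_samples=200_000, seed=0):
    """Fraction of trials in which y looks better than x once noise is added."""
    rng = random.Random(seed)
    hits = sum(
        (p_y - p_x) + logistic_noise(k_x, rng) > 0.0 for _ in range(n_samples)
    )
    return hits / n_samples
```

Comparing `noisy_imitation_fraction(1.0, 1.5, 0.4)` with `fermi(1.0, 1.5, 0.4)` shows agreement within the sampling error, and the agreement holds for other payoff pairs and noise amplitudes as well.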

The evolutionary process is governed by repeating the 
mentioned elementary steps that drive the system towards 
a final state described by the average portion $p$ of cooperators
and the distribution of $K_x$. If initially the players
use a uniform rule ($K_x = K$ for $\forall x$) then this system becomes
equivalent to those studied previously [32]. In that
case we can distinguish three regions of $b$ dependent on
$K$. If $b < b_{c1}(K)$ then only cooperators remain alive after
a transient period. On the contrary, only defectors will
survive in the final state when $b > b_{c2}(K)$. Within the
intermediate region [$b_{c1}(K) < b < b_{c2}(K)$] the stationary
value of $p$ decreases from 1 to 0 if $b$ is increased.
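
For the uniform-rule case, the portion of cooperators can be monitored with a short Monte Carlo loop. The following is a sketch under assumptions rather than the procedure of [32]: it uses a small square lattice with periodic boundaries, encodes C as 1 and D as 0, and defines one sweep as L x L elementary steps. The names `run_uniform_rule` and `cooperator_fraction` are hypothetical.

```python
import math
import random

NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def payoff(s, i, j, b):
    """Weak-PD income of site (i, j): 1 (C) or b (D) per cooperating neighbor."""
    L = len(s)
    n_coop = sum(s[(i + di) % L][(j + dj) % L] for di, dj in NEIGHBORS)
    return n_coop * (1.0 if s[i][j] == 1 else b)

def fermi(p_x, p_y, k):
    """Adoption probability, guarded against overflow for very small k."""
    z = (p_x - p_y) / k
    return 0.0 if z > 700.0 else 1.0 / (1.0 + math.exp(z))

def cooperator_fraction(s):
    """Portion of cooperators on the lattice."""
    L = len(s)
    return sum(map(sum, s)) / (L * L)

def run_uniform_rule(s, k, b, sweeps, rng):
    """Evolve strategies with K_x = k for all x; return the fraction after each sweep."""
    L = len(s)
    history = []
    for _ in range(sweeps):
        for _ in range(L * L):
            i, j = rng.randrange(L), rng.randrange(L)
            di, dj = rng.choice(NEIGHBORS)
            ni, nj = (i + di) % L, (j + dj) % L
            if s[i][j] != s[ni][nj]:  # imitation only matters if strategies differ
                w = fermi(payoff(s, i, j, b), payoff(s, ni, nj, b), k)
                if rng.random() < w:
                    s[i][j] = s[ni][nj]
        history.append(cooperator_fraction(s))
    return history
```

Starting from a homogeneous all-C or all-D configuration, the returned fraction stays at 1 or 0, which illustrates the two absorbing states that bound the coexistence region. Quantitative thresholds require far larger lattices and longer runs than this sketch suggests.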

First we study the present model on the square lattice 
where both $b_{c1}(K)$ and $b_{c2}(K)$ go to 1 if $K$ tends to either
zero or infinity. Besides this, there exists an optimum
value of $K$ where $b_{c2}(K)$ reaches its maximum. A one-peak
profile (for an example see Fig. 31 in [7]) can be observed
when evaluating $p$ as a function of the homogeneous
$K$ for fixed $b$ if $1 < b < \max[b_{c2}(K)]$. In the latter case
we can introduce two threshold values of $K$ in a way that
cooperators die out if $K < K_{c1}(b)$ or $K > K_{c2}(b)$. Within
the intermediate region of $K$ [$K_{c1}(b) < K < K_{c2}(b)$] the
$C$ and $D$ strategies coexist for the given uniform rule.

The above investigations [32] have also indicated that
the relaxation time diverges if $K \to 0$ or $\infty$; therefore the
undesired consequences of this effect were avoided by introducing
an additional constraint, namely, all $K_i > K_{\min}$
(typically $K_{\min} = 0.001$). On the other hand, several runs
have confirmed that rules with high $K_x$ die out fast; therefore
the initial set of $K_i$ has also been limited from above
(typically $K_{\max} = 2$) and $n$ is varied from 2 to 200 for
the sake of simplicity.

Let us discuss the trivial situations when the players
have different $K_x$ parameters in the initial state but
their values exceed the second threshold value, that is,
$K_x > K_{c2}(b)$ for $\forall x$. After some time only defectors remain
alive ($s_x = D$) with a preference for lower $K_i$. When
the cooperator strategy becomes extinct, all players receive
the same payoff, $P_x = 0$, and the further evolution of the
rules ($K_x$) can be well described by the voter model (for a
survey see [38]) with a large number of candidates. This
means that one can observe growing domains of players
with the same rules and the typical domain size increases
with the logarithm of time in the two-dimensional systems.
The same phenomenon is found if $K_x < K_{c1}(b)$
for $\forall x$ as well as for the combination of the latter two
cases when there is no $K_x$ within the intermediate region
[$K_{c1}(b) < K_x < K_{c2}(b)$] in the initial state.
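
The reduction to the voter model follows directly from the adoption probability: when all payoffs are equal, the exponent vanishes and the probability equals $1/(1 + e^0) = 1/2$ for every value of $K_x$, so a rule is copied from a random neighbor regardless of its value. A minimal sketch of this neutral dynamics is given below, assuming a square lattice with periodic boundaries. The function names are hypothetical.

```python
import math
import random

NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def fermi(p_x, p_y, k_x):
    """Adoption probability 1 / (1 + exp[(P_x - P_y) / K_x])."""
    return 1.0 / (1.0 + math.exp((p_x - p_y) / k_x))

def voter_sweep(K, rng):
    """One sweep (L*L elementary steps) of rule copying when all payoffs are zero."""
    L = len(K)
    for _ in range(L * L):
        i, j = rng.randrange(L), rng.randrange(L)
        di, dj = rng.choice(NEIGHBORS)
        k_y = K[(i + di) % L][(j + dj) % L]
        if rng.random() < fermi(0.0, 0.0, K[i][j]):  # equals 1/2 for every K
            K[i][j] = k_y

def surviving_rules(K):
    """Set of distinct rules still present on the lattice."""
    return {k for row in K for k in row}
```

Since rules are only copied and never created, the set returned by `surviving_rules` can only shrink from sweep to sweep, which is the coarsening process referred to in the text.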

The final state of the co-evolutionary process changes 
drastically if initially there are several players with im- 
itation rules belonging to the coexistence region for the 
homogeneous cases, i.e., $K_{c1}(b) < K_x < K_{c2}(b)$. The MC
simulations have indicated clearly that after a relaxation
period all the players use the same imitation rule. The
Darwinian selection chooses the rule $K_i \in \{K_1, \ldots, K_n\}$
that has the "minimum distance" from a fixation value
$K_f(b)$. The quotation marks refer to a possible asymmetry
between the two sides; however, the estimation of its
magnitude is prevented by the statistical error. Apparently
the Darwinian selection favors a rule $K_i$ providing
the highest average payoff (as it occurs for population dynamics);
here, however, the value of $K_f(b)$ does not coincide
with the values of $K$ exhibiting a local maximum in $p$ or in the
average payoff (in general the difference between the latter
two quantities is smaller than our statistical error, which is
comparable to the symbol size). Figure 1 demonstrates the fixation
values within the coexistence region and also the position
of the local maximum of $p$, used frequently to quantify the
cooperativity in the whole society.

Fig. 1: The MC results on the square lattice for the fixation
values are denoted by closed squares within the coexistence
region bounded by the solid line ($b_{c2}(K)$). Open circles show
the position of the local maximum in the portion $p$ of cooperators.
Dotted lines are just to guide the eye.

Naturally, the fixation time depends on the system size
$L$, the initial set of $K_i$ values, and also the number ($n$) of
different values. It turned out that for sufficiently large
system sizes ($L > 200$) the selected rule becomes independent
of the initial configuration and the sequence of random
numbers. The efficiency of the accurate determination of
$K_f(b)$ could be improved significantly if only two rules
were allowed in the initial state, as detailed below.

As mentioned above, the topological features of the
connectivity structure influence the qualitative behavior
(phase diagram) in the evolutionary PD games [32].
On the kagome lattice overlapping triangles support the
spreading of cooperative behavior in the low noise limit.
For homogeneous imitation rules the upper boundary of
the coexistence region ($b_{c2}(K)$) decreases monotonically
from 3/2 to 1 if $K$ increases from 0 to $\infty$ [33]. This behavior
implies the possibility that here the Darwinian selection
of rules (within the coexistence region) favors the
lowest values of $K_i$, corresponding to $K_f(b) = 0$. This behavior
has indeed been confirmed by MC simulations if $b$ exceeds
a threshold value ($b > b_{\rm th} = 1.182(2)$). For low values of
$b$ we have found a behavior resembling that observed on
the square lattice.
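
The topological difference between the two lattices can be made concrete with a neighbor table. The sketch below uses one common indexing of the kagome lattice (three sites per unit cell on an L x L periodic array of cells). This labeling is an assumption chosen for illustration and the function names are hypothetical. Every site has four neighbors, as on the square lattice, but here any two neighboring sites share exactly one common neighbor, that is, every bond belongs to a triangle.

```python
from itertools import product

def kagome_neighbors(L):
    """Neighbor lists of a periodic kagome lattice with L x L unit cells.

    Sites are labeled (i, j, a), where (i, j) is the unit cell and
    a in {0, 1, 2} is the sublattice. Each cell contributes one
    "up" triangle; the remaining bonds form the "down" triangles.
    """
    nbrs = {}
    for i, j in product(range(L), repeat=2):
        ip, im = (i + 1) % L, (i - 1) % L
        jp, jm = (j + 1) % L, (j - 1) % L
        nbrs[(i, j, 0)] = [(i, j, 1), (i, j, 2), (im, j, 1), (i, jm, 2)]
        nbrs[(i, j, 1)] = [(i, j, 0), (i, j, 2), (ip, j, 0), (ip, jm, 2)]
        nbrs[(i, j, 2)] = [(i, j, 0), (i, j, 1), (i, jp, 0), (im, jp, 1)]
    return nbrs

def common_neighbors(nbrs, u, v):
    """Sites adjacent to both u and v."""
    return set(nbrs[u]) & set(nbrs[v])
```

On the square lattice the analogous set of common neighbors is empty for every bond, so the overlapping triangles are a feature specific to the kagome connectivity.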

Figure 2 shows the $K$-dependence of $p$ (for homogeneous
rules $K_x = K$ if $b = 1.17$) in a magnified plot to
emphasize the existence of two local maxima separated by
a shallow local minimum. If the co-evolutionary system
is started from a state with many rules inside the coexistence
region then only one rule (the corresponding $K_f(b)$
is denoted by the vertical dotted line in Fig. 2) will remain
alive in the way described above. There exists, however,
a relevant difference in the behaviors between the square
and kagome lattices. Namely, on the kagome lattice two
attractors (final imitation rules) can be observed. The
horizontal arrows in Fig. 2 illustrate the direction of preference
if initially the players follow rules from the marked
intervals. The result of these types of investigations can
be interpreted as the direction of evolution in $K_x$ through
rare and weak mutations. Although the state of $K_x = 0$
for $\forall x$ has a finite basin of attraction through weak mutations,
this state is overcome by the offspring of players with
$K_f(b)$ being present initially.

Fig. 2: Portion of cooperators as a function of the homogeneous
$K_x = K$ for $b = 1.17$ on the kagome lattice. The MC results
are illustrated by a solid line because the statistical error
is comparable to the line thickness. The horizontal arrows indicate
that the system can evolve towards the fixation value
$K_f(b)$ (denoted by the dotted vertical line) through weak mutations.
At the same time the weak mutations drive the system
towards a state where $K_x \to 0$ if initially each $K_x$ is smaller
than a threshold value denoted by the dashed vertical line.

Fig. 3: Fixation values $K_f(b)$ for the competing imitation rules
on the kagome lattice are illustrated by closed squares. Open
squares indicate the position of the separatrix indicated by the dashed
line in Fig. 2. Solid line represents the maximum values of
$b$ where cooperators can survive for homogeneous rules. Open
circles show the position of local maxima of $p$ within the coexistence
region. The inset compares the prediction (solid line) of
an extended version of the three-site dynamical cluster method
with the results (symbols) of MC simulations.

The MC results for arbitrary values of $b$ are summarized
in Fig. 3 where the cases of $K_f(b) \simeq 0$ are denoted
by several closed squares positioned at $K_{\min}$ (instead of
0) used to avoid the above mentioned technical difficulties.
In these cases the MC simulations have indicated
a plateau (within the statistical error) in the values of $p$
and average payoff. If the value of $b$ is decreased gradually
then an abrupt change of $K_f(b)$ is found at $b = b_{\rm th}$. Below
this threshold value there appears a positive $K_f(b)$ that
can also be related to the local maxima in the portion of
cooperators. Notice that $K_f(b)$ correlates weakly with the
position of the second (right) local maximum of $p$ (see Fig.
2) if $b < b_{\rm th}$. The height of the second local maximum decreases
monotonically if $b$ is increased and this local peak
vanishes above a value larger than $b_{\rm th}$.

As mentioned, the selected rule ($K_f(b)$) can be determined
more efficiently if we consider the competition between
only two suitable imitation rules. This approach
can also be utilized in the extended version of dynamical
cluster techniques (for a brief survey see [7]) where
we derive a set of equations of motion for the probability
of each (strategy and rule) configuration existing on a
given cluster of sites. The accuracy of this method can
be improved by choosing larger clusters. Previous investigations
[32] have confirmed that the three-site (triangular)
cluster of the kagome lattice is the smallest one that gives
an adequate description of all the relevant features for
homogeneous rules. This fact has raised the possibility of
extending this technique to the two-rule cases. The details
of this method will be published elsewhere; now we only
compare its prediction with the MC results in the inset of
Fig. 3. It is noteworthy that this method predicts a slightly
higher threshold value for the payoff parameter $b$, namely,
$b_{\rm th} = 1.219(2)$, and confirms the difference between the
selected rule and the local maxima both in $p$ and in the average payoff.

In summary, the Darwinian selection (imitation of the 
better) proved to be beneficial for the whole society for 
the Prisoner's Dilemma if not only the strategy but also 
the way of strategy adoption is adopted from a successful 
neighboring player. The systematic investigations highlight
the relevance of the selected dynamical rules that,
depending on the connectivity structure and payoff, provide
the highest or almost the highest possible average income.
The small difference between the selected and the
optimal dynamical rules might be related to the
spatial effects enhancing the importance of fluctuations.

* * * 